ExpHunterSuite: Differential Expression Report

Data quality control (QC)

Correlation between samples

Here we show scatterplots comparing expression levels for all genes between the different samples, for i) all controls, ii) all treatment samples and iii) for all samples together. These plots will only be produced when the total number of samples to compare within a group is less than or equal to 10.

Correlation between control samples

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.
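As a rough illustration of the check described above, the following sketch computes pairwise Pearson correlations between replicates and lists the pairs that fall below 0.96. The sample names and expression values are made up, and the function names are hypothetical; this is not the code used to produce the report.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def low_correlation_pairs(samples, threshold=0.96):
    """Return (sample_a, sample_b, r) for every pair with r below the threshold."""
    flagged = []
    for a, b in combinations(sorted(samples), 2):
        r = pearson(samples[a], samples[b])
        if r < threshold:
            flagged.append((a, b, r))
    return flagged

# Made-up expression values for three hypothetical replicates (one value per gene).
toy = {
    "rep_1": [10.0, 200.0, 35.0, 0.0, 80.0],
    "rep_2": [12.0, 190.0, 30.0, 1.0, 85.0],
    "rep_3": [90.0, 20.0, 35.0, 60.0, 5.0],
}
flagged = low_correlation_pairs(toy)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")
```

In this toy input, rep_3 is the replicate that disagrees with the other two, so both pairs involving it are listed.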

Correlation between treatment samples

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.

Correlation between samples: All vs all replicates

Correlation coefficients tend to be slightly higher between replicates from the same group than between replicates from different groups. If this is not the case, it may indicate mislabelling or other potential issues.

Heatmap and clustering showing correlation between replicates

BROWN: higher correlation; YELLOW: lower correlation

Overview

This section gives a general overview of the dimensionality reduction analysis.
Inertia
Graphical representation of the Principal Components (PCs). The bars represent the percentage of total variance explained by each PC.
The line shows the cumulative percentage of total variance across PCs. The color distinguishes significant from non-significant PCs.
Only significant PCs will be considered in the following plots.
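The bar and line values in the inertia plot can be sketched as follows, assuming the PCA eigenvalues are available as a plain list. The eigenvalues below are invented, and the criterion the report uses to call a PC significant is not stated here, so it is not reproduced.

```python
def explained_variance(eigenvalues):
    """Return per-PC and cumulative percentages of total variance."""
    total = sum(eigenvalues)
    percent = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative

# Invented eigenvalues for four PCs.
toy_eigenvalues = [4.0, 3.0, 2.0, 1.0]
percent, cumulative = explained_variance(toy_eigenvalues)
for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{i}: {p:.1f}% (cumulative {c:.1f}%)")
```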
Sample coordinates
This plot represents the coordinates of the samples on the first two Principal Components (Dim1 and Dim2). The percentage of explained variance is given in brackets.
This is a simplified plot of the samples displayed in the two main PCs. The color of the samples indicates their experimental condition.

Categorical variables

This section explores the relationship between supplementary categorical variables (-S option), samples and PCs.

Coordinates of categories
This plot represents the coordinates of the samples and supplementary categories on the first two Principal Components (Dim1 and Dim2). The samples and categories are shown in black and purple, respectively. The percentage of explained variance is given in brackets.
Comparison of significant dimensions
This plot compares the position of the samples and their distribution in the significant PCs. The color differentiates between control (red) and treatment (blue) samples.
Association between qualitative variables and PCs
This plot represents the association between the qualitative variables and the PCs. The association value is the R2 value. The flags represent the significance measured with an analysis of variance, where *: 0.01 < P < 0.05; **: 0.001 < P < 0.01; ***: P < 0.001.
Association between categories and PCs
This plot represents the association between the categories of the qualitative variables and the PCs. The association value is the mean coordinate of the samples in each category on each PC. The flags represent the significance measured with a Student's t test, where *: 0.01 < P < 0.05; **: 0.001 < P < 0.01; ***: P < 0.001.
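The significance flags can be expressed as a small lookup. This sketch assumes the conventional bands (three stars below 0.001, two stars below 0.01, one star below 0.05); how the report treats values that fall exactly on a boundary is not stated, so the handling of boundaries here is an assumption.

```python
def significance_flag(p_value):
    """Map a P value to a star flag using the conventional bands (assumed)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""

# Illustrative P values only.
for p in (0.0005, 0.005, 0.03, 0.2):
    print(p, repr(significance_flag(p)))
```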
No supplementary quantitative variables were included in this analysis.
TOP active quantitative variables
This table summarizes the top 10 active quantitative variables associated with PCs.
factor dimension correlation p.value
ENSG00000261324 PC1 0.996537074342405 1.35286857074731e-06
ENSG00000231074 PC1 0.991782918597745 1.17039743226132e-05
ENSG00000269929 PC1 0.991738492593449 1.1862528345794e-05
ENSG00000273486 PC1 0.991448908363666 1.29275240755007e-05
ENSG00000169752 PC1 0.990959441883737 1.48537236215434e-05
ENSG00000203668 PC1 0.990957779747393 1.4860538554611e-05
ENSG00000221792 PC1 0.990545805590507 1.66077964807736e-05
ENSG00000168273 PC1 0.9902614909604 1.78819759595796e-05
ENSG00000163682 PC1 0.989512030138842 2.15148306712712e-05
ENSG00000259116 PC1 0.989079790052802 2.37950100784874e-05
ENSG00000163939 PC2 0.990125967490653 1.850926071714e-05
ENSG00000118257 PC2 0.981273852068622 9.1243976305645e-05
ENSG00000171365 PC2 0.979973068877129 0.000107848581579049
ENSG00000080546 PC2 0.979487040838471 0.000114481510266787
ENSG00000213614 PC2 0.975647871141264 0.000175431183619448
ENSG00000163291 PC2 0.97389014724131 0.000208623399997438
ENSG00000121741 PC2 0.972638000372638 0.000234383772842269
ENSG00000255081 PC2 0.971283844771989 0.000264274702516549
ENSG00000138246 PC2 0.970177702776333 0.000290290383970345
ENSG00000006459 PC2 0.967726647084045 0.000353194368961308

HCPC

This section explores the groups of samples obtained by Hierarchical Clustering on Principal Components (HCPC) and the relationship of the clusters with supplementary variables. Only the first 2 relevant PCs were used for the HCPC.
Hierarchical clustering of individuals (2 significant PCs)
This plot represents the HCPC dendrogram of the individuals. Groups of individuals are shown in different colors, and the inertia plot is shown at the top right.
HCPC coordinates
This plot represents the coordinates of the samples on the two main Principal Components. The percentage of explained variance is shown in brackets. The samples are colored by their HCPC cluster.
Relationship between HCPC clusters and supplementary qualitative variables
Fisher's exact test is computed between clusters and experimental treatments. The Fisher's exact test P values and FDR are shown.
None of the clusters was significantly associated with any experimental group.
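For readers unfamiliar with the test, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (samples inside or outside a cluster versus control or treatment). The counts are invented and are not taken from this run; the function name is hypothetical, and the report itself may compute the test differently.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Invented example: a cluster holding 3 controls and 1 treatment sample,
# with 0 controls and 3 treatment samples outside the cluster.
p_value = fisher_exact_two_sided(3, 1, 0, 3)
print(f"P = {p_value:.4f}")
```

With so few samples, even a cluster that captures every control does not reach P < 0.05 in this toy table.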

Visualizing normalization results

These boxplots show the distributions of count data before and after normalization (shown for normalization method: default).

Representation of cpm unfiltered data

Before normalization

After normalization

Count metrics by sample ranks

Sample rank versus total counts

Sample rank: the position a sample holds after sorting by total counts.

Statistics of expressed genes

Samples are ranked by total expressed genes. The union of expressed genes is the cumulative number of expressed genes (genes expressed in any sample up to the current sample; expected to increase with sample rank). The intersection of expressed genes is the cumulative number of shared expressed genes (genes expressed in every sample up to the current sample; expected to decrease with sample rank).
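The cumulative union and intersection can be sketched with plain sets. The gene sets below are invented, and ranking from most to fewest expressed genes is an assumption; the report does not state the direction of the ranking.

```python
def cumulative_union_intersection(expressed):
    """expressed: dict mapping sample name -> set of expressed gene IDs.

    Returns one (sample, n_expressed, union_size, intersection_size) tuple
    per sample, in rank order (most expressed genes first, assumed).
    """
    ranked = sorted(expressed, key=lambda s: len(expressed[s]), reverse=True)
    rows = []
    union = set()
    intersection = None
    for sample in ranked:
        genes = expressed[sample]
        union |= genes
        intersection = set(genes) if intersection is None else intersection & genes
        rows.append((sample, len(genes), len(union), len(intersection)))
    return rows

# Invented gene sets for three hypothetical samples.
toy = {
    "s1": {"g1", "g2", "g3", "g4"},
    "s2": {"g1", "g2", "g5"},
    "s3": {"g2", "g6"},
}
rows = cumulative_union_intersection(toy)
for row in rows:
    print(row)
```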

Mean count distribution by filter

This plot represents the mean counts distribution per gene, classified by filters

Gene counts variance distribution

The variance of gene counts across samples is shown. Genes with a variance lower than the selected threshold (dashed grey line) were filtered out.
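A variance filter of this kind can be sketched as below. The threshold is passed in directly as a number; how the report derives it (for example from the count_var_quantile option) is not shown here, and the counts are invented.

```python
from statistics import variance

def filter_low_variance(counts, threshold):
    """counts: dict mapping gene ID -> list of counts across samples.

    Keeps genes whose sample variance is at least the threshold;
    genes with lower variance are dropped.
    """
    return {gene: values for gene, values in counts.items()
            if variance(values) >= threshold}

# Invented counts for three hypothetical genes across four samples.
toy_counts = {
    "g_flat": [5, 5, 5, 5],
    "g_low": [10, 11, 10, 11],
    "g_high": [1, 50, 3, 80],
}
kept = filter_low_variance(toy_counts, threshold=1.0)
print(sorted(kept))
```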

Sample differences by all normalized counts

All counts were normalized with the default algorithm (see options below). The normalized counts were log10-scaled and plotted as a heatmap.

Sample differences by total normalized counts

Percentages of reads per sample mapping to the most highly expressed genes

sample ENSG00000276168 ENSG00000198804 ENSG00000275395 ENSG00000198886 ENSG00000010327
control_1 5.92 4.922 1.952 2.282 1.301
control_2 5.917 4.621 2.596 2.172 1.862
control_4 5.327 4.477 2.489 2.121 2.034
opg_1 8.064 2.121 1.652 1.03 1.352
opg_2 7.143 3.126 1.436 1.512 1.416
opg_3 5.585 1.661 4.406 0.803 1.09
opg_4 5.775 1.675 3.788 0.84 1.397
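A table like the one above could be derived roughly as follows. This sketch assumes that the most highly expressed genes are chosen by total count across all samples and that each percentage is relative to the sample's own total count; neither detail is confirmed by the report, and the counts are invented.

```python
def top_gene_percentages(counts, n_top=5):
    """counts: dict mapping sample -> dict mapping gene ID -> raw count.

    Returns the top genes (by total count over all samples, assumed) and,
    per sample, the percentage of that sample's counts mapping to each one.
    """
    totals = {}
    for per_gene in counts.values():
        for gene, value in per_gene.items():
            totals[gene] = totals.get(gene, 0) + value
    top = sorted(totals, key=totals.get, reverse=True)[:n_top]
    table = {}
    for sample, per_gene in counts.items():
        library_size = sum(per_gene.values())
        table[sample] = {gene: round(100.0 * per_gene.get(gene, 0) / library_size, 3)
                         for gene in top}
    return top, table

# Invented counts for two hypothetical samples and three genes.
toy_counts = {
    "s1": {"gA": 50, "gB": 30, "gC": 20},
    "s2": {"gA": 10, "gB": 60, "gC": 30},
}
top, table = top_gene_percentages(toy_counts, n_top=2)
print(top)
print(table)
```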

Details of input data

First group of samples (to be referred to as control in the rest of the report)

Sample Names:
control_1
control_2
control_4

Second group of samples (to be referred to as treatment in the rest of the report)

Sample Names:
opg_1
opg_2
opg_3
opg_4

DEgenes Hunter results

Gene classification by DEgenes Hunter

DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly.

Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.
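The sign convention can be checked with a deliberately naive calculation. The DE packages estimate log fold changes with their own statistical models, so the function below (a hypothetical helper with an assumed pseudocount of 1) only illustrates the direction of the ratio, treatment over control.

```python
import math

def log2_fold_change(control, treatment, pseudocount=1.0):
    """Naive log2 ratio of mean treatment expression over mean control expression."""
    mean_control = sum(control) / len(control)
    mean_treatment = sum(treatment) / len(treatment)
    return math.log2((mean_treatment + pseudocount) / (mean_control + pseudocount))

# Invented counts: higher in treatment gives a positive value.
up = log2_fold_change(control=[7, 7, 7], treatment=[31, 31, 31, 31])
# Swapping the groups gives the same magnitude with a negative sign.
down = log2_fold_change(control=[31, 31, 31, 31], treatment=[7, 7, 7])
print(up, down)
```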

This barplot shows the total number of genes passing each stage of analysis - from the total number of genes in the input table of counts, to the genes surviving the expression filter, to the genes detected as DE by one package, to the genes detected by at least 2 packages.
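The staging described above can be sketched as a simple rule. The cutoffs mirror the option values listed at the end of this report (p_val_cutoff 0.05, lfc 1, minpack_common 2), but the label names, the function names and the exact comparison operators are assumptions made for illustration.

```python
def count_detections(per_package, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """per_package: dict mapping package name -> (FDR, logFC) for one gene."""
    return sum(1 for fdr, lfc in per_package.values()
               if fdr <= fdr_cutoff and abs(lfc) >= lfc_cutoff)

def classify_gene(per_package, passed_filter=True, min_packages=2):
    """Label a gene by how many packages call it differentially expressed."""
    if not passed_filter:
        return "FILTERED_OUT"      # removed by the expression filter
    detections = count_detections(per_package)
    if detections >= min_packages:
        return "PREVALENT_DEG"     # detected by at least min_packages packages
    if detections >= 1:
        return "POSSIBLE_DEG"      # detected by at least one package
    return "NOT_DEG"

# Invented results for one gene from two packages.
example = {"DESeq2": (0.01, 1.5), "edgeR": (0.02, 1.4)}
print(classify_gene(example))
```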

Package DEG detection stats

This is the Venn Diagram of all possible DE genes (DEGs) according to at least one of the selected DE detection packages

Plot showing variability between different DEG detection methods in terms of logFC calculation

This graph shows the logFC calculated (y-axis) by each package (points) for each gene (x-axis). Only genes with variability over 0.01 are plotted. This representation lets the user observe the behaviour of each DE package and see whether any of them gives atypical results.

If no genes show sufficient variance in estimated logFC across methods, no plot is produced and a warning message is given.
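The selection of genes for this plot might look like the sketch below. It assumes that "variability" means the sample variance of the logFC values across packages, which the report does not confirm; the logFC values are invented.

```python
from statistics import variance

def variable_logfc_genes(logfc, min_variability=0.01):
    """logfc: dict mapping gene ID -> dict mapping package name -> logFC.

    Returns genes whose logFC variance across packages exceeds the cutoff,
    together with that variance.
    """
    selected = {}
    for gene, by_package in logfc.items():
        values = list(by_package.values())
        if len(values) < 2:
            continue  # variance is undefined with a single package
        spread = variance(values)
        if spread > min_variability:
            selected[gene] = spread
    return selected

# Invented logFC estimates from two packages.
toy_logfc = {
    "g1": {"DESeq2": 1.50, "edgeR": 1.52},
    "g2": {"DESeq2": 2.00, "edgeR": 2.60},
}
selected = variable_logfc_genes(toy_logfc)
if not selected:
    print("No genes with sufficient logFC variability; nothing to plot.")
else:
    print(selected)
```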

FDR gene-wise benchmarking

Benchmark of false positive calling: Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package

FDR Volcano Plot showing log 2 fold change vs. FDR

The red horizontal line represents the FDR threshold, which has been set to 0.05
The black lines represent other values.

DEgenes Hunter differential expression analysis results can be found in file Common_results/hunter_results_table.txt

DE detection package-specific results

Various plots specific to each package are shown below:

DESeq2 normalization effects

This plot compares the effective library size with raw library size

The effective library size is the factor that the DESeq2 normalization algorithm applies to each sample. It is expected to depend on the raw library size.
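To make the idea concrete, here is a minimal sketch of the median-of-ratios calculation, assuming that the effective library size here corresponds to the DESeq2 size factor. It is a simplified re-implementation for illustration with invented counts, not the DESeq2 code itself.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict mapping sample -> list of raw counts (same gene order).

    For each gene, take the geometric mean across samples; a sample's size
    factor is the median ratio of its counts to those geometric means.
    Genes with a zero count in any sample are skipped.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm
                      for g, lgm in enumerate(log_geo_means) if lgm is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Invented counts: s2 was sequenced about twice as deeply as s1.
toy_counts = {"s1": [10, 20, 30, 0], "s2": [20, 40, 60, 5]}
factors = size_factors(toy_counts)
for sample, factor in factors.items():
    print(sample, sum(toy_counts[sample]), round(factor, 3))
```

In this toy input the sample with the larger raw library also gets the larger factor, which is the relationship the plot is meant to confirm.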

DESeq2 MA plot

This is the MA plot from DESeq2 package

In DESeq2, the MA plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted P value is less than 0.1. Points that fall outside the plotting window are drawn as open triangles pointing either up or down. A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt. A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt.

Differences between samples by PREVALENT DEGs normalized counts

Counts of prevalent DEGs were normalized with the DESeq2 algorithm. These counts were log10-scaled and plotted as a heatmap.

edgeR MA plot

This is the MA plot from package edgeR

Differential gene expression data can be visualized as MA plots (log ratio versus abundance) where each dot represents a gene. The differentially expressed genes are colored red and the non-differentially expressed ones are colored black. A table containing the edgeR DEGs is provided in Results_edgeR/DEgenes_edgeR.txt. A table containing the edgeR normalized counts is provided in Results_edgeR/Normalized_counts_edgeR.txt.

Detailed package results comparison

This is an advanced section that allows comparing the output of packages unadjusted for DE analysis. The data shown here do not necessarily reflect biological impact.

P-value distributions

Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)
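The relationship between unadjusted and adjusted values can be sketched with the Benjamini-Hochberg procedure. That the FDR values in this report come from Benjamini-Hochberg is an assumption (it is a common default in DE packages); the P values below are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented unadjusted P values.
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
for p, q in zip(raw, fdr):
    print(f"P = {p:.2f}  FDR = {q:.4f}")
```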

FDR Correlations

Values of options used to run DEGenesHunter

First column contains the option names; second column contains the given values for each option in this run.

option value
input_file /Users/marmtnez/Desktop/Master_Bioinfo/TFM/Files/final_counts.txt
pseudocounts FALSE
reads 2
count_var_quantile 0
minlibraries 2
filter_type separate
output_files /Users/marmtnez/Desktop/Master_Bioinfo/TFM/Results/degenes/ctrl_vs_opg_degenes_fc1
p_val_cutoff 0.05
lfc 1
modules DE
minpack_common 2
target_file /Users/marmtnez/Desktop/Master_Bioinfo/TFM/Files/ctrl_vs_opg_target.txt
model_variables
numerics_as_factors FALSE
string_factors
numeric_factors
WGCNA_memory 5000
WGCNA_norm_method DESeq2
WGCNA_deepsplit 2
WGCNA_min_genes_cluster 20
WGCNA_detectcutHeight 0.995
WGCNA_mergecutHeight 0.25
WGCNA_all FALSE
WGCNA_blockwiseNetworkType signed
WGCNA_blockwiseTOMType signed
WGCNA_minCoreKME 0.7
WGCNA_minKMEtoStay 0.5
WGCNA_corType pearson
multifactorial
help FALSE